Performance of a Parallel Matrix Multiplication Routine on Intel iPSC/860
Authors
Abstract
The performance of a parallel matrix-matrix multiplication routine with the same functionality as DGEMM of BLAS3 was tested for different numbers of nodes on a 32-node iPSC/860. The routine was then tuned for maximum performance on this particular computer system. Small changes in the original code led to substantially higher performance, and in all tested configurations there is a critical matrix size n ≈ 50·np, where np is the number of processors, above which Intel's non-blocking isend is more efficient than the blocking csend. This shows that special tuning for a single machine pays off for large matrices.
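The two facts the abstract reports can be sketched in a few lines: the routine computes the BLAS3 DGEMM operation C ← αAB + βC, and the tuned code should switch from the blocking csend to the non-blocking isend once the matrix order exceeds roughly 50·np. This is an illustrative Python sketch, not the paper's code; both function names are hypothetical, and only the DGEMM semantics and the reported crossover are taken from the abstract.

```python
# Illustrative sketch (hypothetical names, not the paper's Fortran/C code).

def dgemm_like(alpha, A, B, beta, C):
    """Plain-Python C <- alpha*A*B + beta*C on lists of lists,
    the operation that BLAS3 DGEMM computes."""
    n, k, m = len(A), len(B), len(B[0])
    return [[alpha * sum(A[i][p] * B[p][j] for p in range(k)) + beta * C[i][j]
             for j in range(m)] for i in range(n)]

def preferred_send_mode(n, num_procs, critical_factor=50):
    """Encode the reported crossover n ~ 50*np: above it, Intel's
    non-blocking isend beat the blocking csend on the iPSC/860."""
    return "isend" if n > critical_factor * num_procs else "csend"
```

On the paper's 32-node machine this heuristic places the crossover near n ≈ 1600: `preferred_send_mode(1024, 32)` gives `"csend"` while `preferred_send_mode(2048, 32)` gives `"isend"`.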
Similar Resources
Computation Time In BMR
Figure 13: Running time of the assembly DGEMM routine vs that of the C routine of the S-method coupled with DGEMM on a single processor. MINDIM=100 for the S-method. Strassen's algorithm has been presented and compared with other parallel matrix multiplication algorithms. On the Intel iPSC/860, the BMR-Strassen method coupled with assembly BLAS routines offers the fastest approach to matrix multip...
Performance Experiments and Optimizations of PDE Sparse Solvers on Hypercubes
In this report we present the results of experiments with the parallel sparse matrix solver of the Parallel Ellpack System. Three different hypercube parallel machines are used to compare and optimize its performance. After a brief description of the parallel sparse matrix solver and a presentation of the machine parameters and features, the measurements of performance of the sparse solver on t...
The Conjugate Gradient Method for Large Sparse Matrices on the Intel iPSC/860 Hypercube
For large sparse unstructured matrices, the critical parts of the Conjugate Gradient method on the iPSC/860 are the inter-processor communications needed for the matrix-vector multiplication and the vector-updates. In this work several implementations are tested and discussed in search for an optimal algorithm. They differ in distribution of the matrix and the various vectors over the processors...
Early Experience With the Intel iPSC/860 At Oak Ridge National Laboratory
This report summarizes the early experience in using the Intel iPSC/860 parallel supercomputer at Oak Ridge National Laboratory. The hardware and software are described in some detail, and the machine's performance is studied using both simple computational kernels and a number of complete applications programs.
A PERFORMANCE STUDY OF SPARSE CHOLESKY FACTORIZATION ON INTEL iPSC/860
The problem of Cholesky factorization of a sparse matrix has been very well investigated on sequential machines. A number of efficient codes exist for factorizing large unstructured sparse matrices, for example, codes from Harwell Subroutine Library [4] and Sparspak [7]. However, there is a lack of such efficient codes on parallel machines in general, and distributed memory machines in particul...
Journal: Parallel Computing
Volume 20, Issue
Pages -
Publication date: 1994